
Maven Communications is a California-based Telecommunications company, we would be assuming the role of a Business Intelligence(BI) Consultant.
Objective - The objective of the case study is to help the company - Maven Communications to improve retention by identifying high value customers and churn risks, and have been asked to present your findings to the CMO in the form of a single page report or dashboard.
Methods - We will be using Python for this case study and Pandas package will helps us in manipulating the data as per our requirement, with this we will also be using the Holoviz, which is a collection of multiple different high level python plotting libraries.
We will be performing data analysis step-by-step starting with the:
STAGE-A
1. Prepare Phase - We will collect the data and make it appropriate for the futher stages.
2. Process Phase - A deep level of data cleaning and wrangling.
STAGE-B
3. Analysis Phase - In this we will be exploring the dataset and understanding the better way to solve some of the questions.
4. Share Phase - Making our analysis capable enough for any one to understand.
5. Act Phase - Suggestion and implementation on the basis of the analysis done in the above phase.
#Loading all the packages needed for this analysis
#pandas for data wrangling/preparation
import pandas as pd
#panel for dashboard
import panel as pn
pn.extension(sizing_mode = 'stretch_width')
#collection of different tools
import holoviews as hv
hv.extension('bokeh')
#Plotting api
import hvplot.pandas
#geoviews for geo plot
import geoviews as gv
from geoviews import opts, tile_sources as gvts
import cartopy.crs as ccrs
#to ignore warnings
import warnings
warnings.filterwarnings("ignore")
#operating system dependent functionality
import os
#looking for current working directory
directory = os.getcwd()
directory
'/home/john/Downloads/Maven-Telecom-Analysis-main'
#Reading the customer data
tele_cust_df = pd.read_csv(f'{directory}/Data/telecom_customer_churn.csv')
#Setting Max no. of columns to view the complete data
pd.set_option('display.max_columns', 38)
#Head part of the Customer Data
tele_cust_df.head()
| Customer ID | Gender | Age | Married | Number of Dependents | City | Zip Code | Latitude | Longitude | Number of Referrals | Tenure in Months | Offer | Phone Service | Avg Monthly Long Distance Charges | Multiple Lines | Internet Service | Internet Type | Avg Monthly GB Download | Online Security | Online Backup | Device Protection Plan | Premium Tech Support | Streaming TV | Streaming Movies | Streaming Music | Unlimited Data | Contract | Paperless Billing | Payment Method | Monthly Charge | Total Charges | Total Refunds | Total Extra Data Charges | Total Long Distance Charges | Total Revenue | Customer Status | Churn Category | Churn Reason | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0002-ORFBO | Female | 37 | Yes | 0 | Frazier Park | 93225 | 34.827662 | -118.999073 | 2 | 9 | None | Yes | 42.39 | No | Yes | Cable | 16.0 | No | Yes | No | Yes | Yes | No | No | Yes | One Year | Yes | Credit Card | 65.6 | 593.30 | 0.00 | 0 | 381.51 | 974.81 | Stayed | NaN | NaN |
| 1 | 0003-MKNFE | Male | 46 | No | 0 | Glendale | 91206 | 34.162515 | -118.203869 | 0 | 9 | None | Yes | 10.69 | Yes | Yes | Cable | 10.0 | No | No | No | No | No | Yes | Yes | No | Month-to-Month | No | Credit Card | -4.0 | 542.40 | 38.33 | 10 | 96.21 | 610.28 | Stayed | NaN | NaN |
| 2 | 0004-TLHLJ | Male | 50 | No | 0 | Costa Mesa | 92627 | 33.645672 | -117.922613 | 0 | 4 | Offer E | Yes | 33.65 | No | Yes | Fiber Optic | 30.0 | No | No | Yes | No | No | No | No | Yes | Month-to-Month | Yes | Bank Withdrawal | 73.9 | 280.85 | 0.00 | 0 | 134.60 | 415.45 | Churned | Competitor | Competitor had better devices |
| 3 | 0011-IGKFF | Male | 78 | Yes | 0 | Martinez | 94553 | 38.014457 | -122.115432 | 1 | 13 | Offer D | Yes | 27.82 | No | Yes | Fiber Optic | 4.0 | No | Yes | Yes | No | Yes | Yes | No | Yes | Month-to-Month | Yes | Bank Withdrawal | 98.0 | 1237.85 | 0.00 | 0 | 361.66 | 1599.51 | Churned | Dissatisfaction | Product dissatisfaction |
| 4 | 0013-EXCHZ | Female | 75 | Yes | 0 | Camarillo | 93010 | 34.227846 | -119.079903 | 3 | 3 | None | Yes | 7.38 | No | Yes | Fiber Optic | 11.0 | No | No | No | Yes | Yes | No | No | Yes | Month-to-Month | Yes | Credit Card | 83.9 | 267.40 | 0.00 | 0 | 22.14 | 289.54 | Churned | Dissatisfaction | Network reliability |
# Info of our columns
tele_cust_df.info(memory_usage = 'deep')
<class 'pandas.core.frame.DataFrame'> RangeIndex: 7043 entries, 0 to 7042 Data columns (total 38 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Customer ID 7043 non-null object 1 Gender 7043 non-null object 2 Age 7043 non-null int64 3 Married 7043 non-null object 4 Number of Dependents 7043 non-null int64 5 City 7043 non-null object 6 Zip Code 7043 non-null int64 7 Latitude 7043 non-null float64 8 Longitude 7043 non-null float64 9 Number of Referrals 7043 non-null int64 10 Tenure in Months 7043 non-null int64 11 Offer 7043 non-null object 12 Phone Service 7043 non-null object 13 Avg Monthly Long Distance Charges 6361 non-null float64 14 Multiple Lines 6361 non-null object 15 Internet Service 7043 non-null object 16 Internet Type 5517 non-null object 17 Avg Monthly GB Download 5517 non-null float64 18 Online Security 5517 non-null object 19 Online Backup 5517 non-null object 20 Device Protection Plan 5517 non-null object 21 Premium Tech Support 5517 non-null object 22 Streaming TV 5517 non-null object 23 Streaming Movies 5517 non-null object 24 Streaming Music 5517 non-null object 25 Unlimited Data 5517 non-null object 26 Contract 7043 non-null object 27 Paperless Billing 7043 non-null object 28 Payment Method 7043 non-null object 29 Monthly Charge 7043 non-null float64 30 Total Charges 7043 non-null float64 31 Total Refunds 7043 non-null float64 32 Total Extra Data Charges 7043 non-null int64 33 Total Long Distance Charges 7043 non-null float64 34 Total Revenue 7043 non-null float64 35 Customer Status 7043 non-null object 36 Churn Category 1869 non-null object 37 Churn Reason 1869 non-null object dtypes: float64(9), int64(6), object(23) memory usage: 9.7 MB
#Checking null values in all the columns.
tele_cust_df.isna().sum()
Customer ID 0 Gender 0 Age 0 Married 0 Number of Dependents 0 City 0 Zip Code 0 Latitude 0 Longitude 0 Number of Referrals 0 Tenure in Months 0 Offer 0 Phone Service 0 Avg Monthly Long Distance Charges 682 Multiple Lines 682 Internet Service 0 Internet Type 1526 Avg Monthly GB Download 1526 Online Security 1526 Online Backup 1526 Device Protection Plan 1526 Premium Tech Support 1526 Streaming TV 1526 Streaming Movies 1526 Streaming Music 1526 Unlimited Data 1526 Contract 0 Paperless Billing 0 Payment Method 0 Monthly Charge 0 Total Charges 0 Total Refunds 0 Total Extra Data Charges 0 Total Long Distance Charges 0 Total Revenue 0 Customer Status 0 Churn Category 5174 Churn Reason 5174 dtype: int64
# Checking whether the NaN values are 'No' or 'None' as per the Data Dictionary.
tele_cust_df.loc[tele_cust_df['Phone Service'] == 'No', ['Avg Monthly Long Distance Charges', 'Multiple Lines']]
| Avg Monthly Long Distance Charges | Multiple Lines | |
|---|---|---|
| 10 | NaN | NaN |
| 14 | NaN | NaN |
| 16 | NaN | NaN |
| 19 | NaN | NaN |
| 25 | NaN | NaN |
| ... | ... | ... |
| 6979 | NaN | NaN |
| 6980 | NaN | NaN |
| 6996 | NaN | NaN |
| 7016 | NaN | NaN |
| 7042 | NaN | NaN |
682 rows × 2 columns
# Checking whether the NaN values are 'No' or 'None' as per the Data Dictionary.
tele_cust_df.loc[tele_cust_df['Internet Service'] == 'No', tele_cust_df.columns[16:26]]
| Internet Type | Avg Monthly GB Download | Online Security | Online Backup | Device Protection Plan | Premium Tech Support | Streaming TV | Streaming Movies | Streaming Music | Unlimited Data | |
|---|---|---|---|---|---|---|---|---|---|---|
| 20 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 23 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 24 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 27 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 28 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 7026 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 7028 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 7032 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 7033 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 7037 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1526 rows × 10 columns
# custmer segregation
tele_cust_df['Customer Status'].value_counts()
Stayed 4720 Churned 1869 Joined 454 Name: Customer Status, dtype: int64
After going through all the null values and the data dictionary, I have understood the reason behind the null values.
All the missing or null values need to be replaced with 0, No and None.
#Replacing NaN values with appropriate values
tele_cust_df.fillna({"Avg Monthly Long Distance Charges": 0,
"Multiple Lines": 'No',
"Internet Type": 'None',
"Avg Monthly GB Download": 0,
"Online Security": 'No',
"Online Backup": 'No',
"Device Protection Plan": 'No',
"Premium Tech Support": 'No',
"Streaming TV": 'No',
"Streaming Movies": 'No',
"Streaming Music": 'No',
"Unlimited Data": 'No',
"Churn Category": 'None',
"Churn Reason": 'None'}, inplace = True)
# There should be 0 Nan values
tele_cust_df.isna().sum()
Customer ID 0 Gender 0 Age 0 Married 0 Number of Dependents 0 City 0 Zip Code 0 Latitude 0 Longitude 0 Number of Referrals 0 Tenure in Months 0 Offer 0 Phone Service 0 Avg Monthly Long Distance Charges 0 Multiple Lines 0 Internet Service 0 Internet Type 0 Avg Monthly GB Download 0 Online Security 0 Online Backup 0 Device Protection Plan 0 Premium Tech Support 0 Streaming TV 0 Streaming Movies 0 Streaming Music 0 Unlimited Data 0 Contract 0 Paperless Billing 0 Payment Method 0 Monthly Charge 0 Total Charges 0 Total Refunds 0 Total Extra Data Charges 0 Total Long Distance Charges 0 Total Revenue 0 Customer Status 0 Churn Category 0 Churn Reason 0 dtype: int64
All the missing values are replaced and the data is now complete
#checking for any duplicate values
tele_cust_df.duplicated().sum()
0
0 indicates no duplicate value
#changing the data types of the columns
col_list = tele_cust_df.columns
col_list = col_list[[1, 3, 11, 12, 14, 15, 16, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 35, 36, 37]].to_list()
tele_cust_df[col_list] = tele_cust_df[col_list].astype('category')
tele_cust_df.info(memory_usage = 'deep')
<class 'pandas.core.frame.DataFrame'> RangeIndex: 7043 entries, 0 to 7042 Data columns (total 38 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Customer ID 7043 non-null object 1 Gender 7043 non-null category 2 Age 7043 non-null int64 3 Married 7043 non-null category 4 Number of Dependents 7043 non-null int64 5 City 7043 non-null object 6 Zip Code 7043 non-null int64 7 Latitude 7043 non-null float64 8 Longitude 7043 non-null float64 9 Number of Referrals 7043 non-null int64 10 Tenure in Months 7043 non-null int64 11 Offer 7043 non-null category 12 Phone Service 7043 non-null category 13 Avg Monthly Long Distance Charges 7043 non-null float64 14 Multiple Lines 7043 non-null category 15 Internet Service 7043 non-null category 16 Internet Type 7043 non-null category 17 Avg Monthly GB Download 7043 non-null float64 18 Online Security 7043 non-null category 19 Online Backup 7043 non-null category 20 Device Protection Plan 7043 non-null category 21 Premium Tech Support 7043 non-null category 22 Streaming TV 7043 non-null category 23 Streaming Movies 7043 non-null category 24 Streaming Music 7043 non-null category 25 Unlimited Data 7043 non-null category 26 Contract 7043 non-null category 27 Paperless Billing 7043 non-null category 28 Payment Method 7043 non-null category 29 Monthly Charge 7043 non-null float64 30 Total Charges 7043 non-null float64 31 Total Refunds 7043 non-null float64 32 Total Extra Data Charges 7043 non-null int64 33 Total Long Distance Charges 7043 non-null float64 34 Total Revenue 7043 non-null float64 35 Customer Status 7043 non-null category 36 Churn Category 7043 non-null category 37 Churn Reason 7043 non-null category dtypes: category(21), float64(9), int64(6), object(2) memory usage: 1.8 MB
After changing the datatypes to category the memory usage is dropped from ~10 MiB to ~1 MiB
With this Importing and wrangling is done.
#descriptive statistics of the dataframe
tele_cust_df.describe()
| Age | Number of Dependents | Zip Code | Latitude | Longitude | Number of Referrals | Tenure in Months | Avg Monthly Long Distance Charges | Avg Monthly GB Download | Monthly Charge | Total Charges | Total Refunds | Total Extra Data Charges | Total Long Distance Charges | Total Revenue | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 7043.000000 | 7043.000000 | 7043.000000 | 7043.000000 | 7043.000000 | 7043.000000 | 7043.000000 | 7043.000000 | 7043.000000 | 7043.000000 | 7043.000000 | 7043.000000 | 7043.000000 | 7043.000000 | 7043.000000 |
| mean | 46.509726 | 0.468692 | 93486.070567 | 36.197455 | -119.756684 | 1.951867 | 32.386767 | 22.958954 | 20.515405 | 63.596131 | 2280.381264 | 1.962182 | 6.860713 | 749.099262 | 3034.379056 |
| std | 16.750352 | 0.962802 | 1856.767505 | 2.468929 | 2.154425 | 3.001199 | 24.542061 | 15.448113 | 20.418940 | 31.204743 | 2266.220462 | 7.902614 | 25.104978 | 846.660055 | 2865.204542 |
| min | 19.000000 | 0.000000 | 90001.000000 | 32.555828 | -124.301372 | 0.000000 | 1.000000 | 0.000000 | 0.000000 | -10.000000 | 18.800000 | 0.000000 | 0.000000 | 0.000000 | 21.360000 |
| 25% | 32.000000 | 0.000000 | 92101.000000 | 33.990646 | -121.788090 | 0.000000 | 9.000000 | 9.210000 | 3.000000 | 30.400000 | 400.150000 | 0.000000 | 0.000000 | 70.545000 | 605.610000 |
| 50% | 46.000000 | 0.000000 | 93518.000000 | 36.205465 | -119.595293 | 0.000000 | 29.000000 | 22.890000 | 17.000000 | 70.050000 | 1394.550000 | 0.000000 | 0.000000 | 401.440000 | 2108.640000 |
| 75% | 60.000000 | 0.000000 | 95329.000000 | 38.161321 | -117.969795 | 3.000000 | 55.000000 | 36.395000 | 27.000000 | 89.750000 | 3786.600000 | 0.000000 | 0.000000 | 1191.100000 | 4801.145000 |
| max | 80.000000 | 9.000000 | 96150.000000 | 41.962127 | -114.192901 | 11.000000 | 72.000000 | 49.990000 | 85.000000 | 118.750000 | 8684.800000 | 49.790000 | 150.000000 | 3564.720000 | 11979.340000 |
#Radio Button Widget to control the plots
select_widget = pn.widgets.Select(name = 'Select to apply',
options = ['Total Revenue(10e3)',
'Total Refunds',
'Total Customer'],
value = 'Total Revenue(10e3)')
select_widget
#Checking whether the widget is working or not.
def display_1(a):
return f'Radio Button Value: {a}'
pn.Column(
select_widget,
pn.bind(display_1, a = select_widget)
)
#Grouping the data on city and renaming the columns.
cust_rev_df = tele_cust_df.groupby('City', as_index = False)\
.agg({'Latitude': 'mean',
'Longitude': 'mean',
'Customer ID': 'count',
'Total Revenue': 'sum',
'Total Refunds': 'sum'})
cust_rev_df['Total Revenue'] = (cust_rev_df['Total Revenue'] / 1000)
cust_rev_df.rename(columns = {'Customer ID': 'Total Customer', 'Total Revenue': 'Total Revenue(10e3)'},
inplace = True)
cust_rev_df
| City | Latitude | Longitude | Total Customer | Total Revenue(10e3) | Total Refunds | |
|---|---|---|---|---|---|---|
| 0 | Acampo | 38.200231 | -121.235034 | 4 | 18.10796 | 1.27 |
| 1 | Acton | 34.501452 | -118.207862 | 4 | 12.15636 | 0.00 |
| 2 | Adelanto | 34.667815 | -117.536183 | 5 | 18.23549 | 48.23 |
| 3 | Adin | 41.171578 | -120.913161 | 4 | 5.53938 | 0.00 |
| 4 | Agoura Hills | 34.129058 | -118.759788 | 5 | 10.64188 | 0.00 |
| ... | ... | ... | ... | ... | ... | ... |
| 1101 | Yreka | 41.764869 | -122.671316 | 4 | 17.46767 | 0.00 |
| 1102 | Yuba City | 39.051552 | -121.660521 | 8 | 17.86782 | 123.31 |
| 1103 | Yucaipa | 34.045970 | -117.011825 | 4 | 29.76580 | 0.00 |
| 1104 | Yucca Valley | 34.159534 | -116.425984 | 5 | 12.37469 | 40.41 |
| 1105 | Zenia | 40.170357 | -123.417298 | 4 | 12.73338 | 0.00 |
1106 rows × 6 columns
#creating geo plot and table and this can be controlled by the widget
def cust_rev_geoplot(select_widget):
return cust_rev_df.sort_values(select_widget, ascending = True)\
.hvplot.points('Longitude',
'Latitude',
xaxis = None,
yaxis = None,
geo=True,
tiles = 'OSM',
frame_width = 550,
color = select_widget,
cmap = 'plasma',
size = select_widget,
hover_cols = ['City'],
clabel = select_widget,
title = f'{select_widget} as per Cities')
cust_rev_bind = pn.bind(cust_rev_geoplot, select_widget)
pn.Column(select_widget, cust_rev_bind)
Note: Cities that have the highest revenue and the most customer are Los Angeles, San Diego, Sacramento, San Jose and San Francisco
# Creating a new column 'Tenure' for a ordinal category.
tele_cust_df['Tenure'] = pd.cut(tele_cust_df['Tenure in Months'],
bins = [0, 12, 24, 36, 48, 60, 72],
labels = ['Less than 1 Year', '1 - 2 Years', '2 - 3 Years', '3 - 4 Years',
'4 - 5 Years', '5 - 6 Years'])
tele_cust_df['Tenure'].unique()
['Less than 1 Year', '1 - 2 Years', '5 - 6 Years', '4 - 5 Years', '2 - 3 Years', '3 - 4 Years'] Categories (6, object): ['Less than 1 Year' < '1 - 2 Years' < '2 - 3 Years' < '3 - 4 Years' < '4 - 5 Years' < '5 - 6 Years']
#Grouping on Tenure and Customer Status and renaming the columns
rev_refund_df_1 = tele_cust_df.groupby(['Tenure', 'Customer Status'])\
.agg({'Customer ID': 'count',
'Total Revenue': 'sum',
'Total Refunds': 'sum'}).reset_index()
rev_refund_df_1['Total Revenue'] = rev_refund_df_1['Total Revenue'] / 1000
rev_refund_df_1.rename(columns = {'Customer ID': 'Total Customer', 'Total Revenue': 'Total Revenue(10e3)'},
inplace = True)
rev_refund_df_1
| Tenure | Customer Status | Total Customer | Total Revenue(10e3) | Total Refunds | |
|---|---|---|---|---|---|
| 0 | Less than 1 Year | Churned | 1037 | 397.41770 | 1033.93 |
| 1 | Less than 1 Year | Joined | 454 | 54.27975 | 108.35 |
| 2 | Less than 1 Year | Stayed | 695 | 396.26792 | 990.50 |
| 3 | 1 - 2 Years | Churned | 294 | 524.74053 | 708.75 |
| 4 | 1 - 2 Years | Joined | 0 | 0.00000 | 0.00 |
| 5 | 1 - 2 Years | Stayed | 730 | 1072.81979 | 1712.10 |
| 6 | 2 - 3 Years | Churned | 180 | 585.68371 | 432.36 |
| 7 | 2 - 3 Years | Joined | 0 | 0.00000 | 0.00 |
| 8 | 2 - 3 Years | Stayed | 652 | 1645.92039 | 1667.24 |
| 9 | 3 - 4 Years | Churned | 145 | 659.78713 | 332.90 |
| 10 | 3 - 4 Years | Joined | 0 | 0.00000 | 0.00 |
| 11 | 3 - 4 Years | Stayed | 617 | 2218.43720 | 1762.97 |
| 12 | 4 - 5 Years | Churned | 120 | 742.74710 | 217.84 |
| 13 | 4 - 5 Years | Joined | 0 | 0.00000 | 0.00 |
| 14 | 4 - 5 Years | Stayed | 712 | 3569.17219 | 1697.60 |
| 15 | 5 - 6 Years | Churned | 93 | 774.08365 | 113.20 |
| 16 | 5 - 6 Years | Joined | 0 | 0.00000 | 0.00 |
| 17 | 5 - 6 Years | Stayed | 1314 | 8729.77463 | 3041.91 |
#Grouping on Tenure and Contract and renaming the columns
rev_refund_df_2 = tele_cust_df.groupby(['Tenure', 'Contract'])\
.agg({'Customer ID': 'count',
'Total Revenue': 'sum',
'Total Refunds': 'sum'}).reset_index()
rev_refund_df_2['Total Revenue'] = rev_refund_df_2['Total Revenue'] / 1000
rev_refund_df_2.rename(columns = {'Customer ID': 'Total Customer', 'Total Revenue': 'Total Revenue(10e3)'},
inplace = True)
rev_refund_df_2
| Tenure | Contract | Total Customer | Total Revenue(10e3) | Total Refunds | |
|---|---|---|---|---|---|
| 0 | Less than 1 Year | Month-to-Month | 1913 | 719.67822 | 1695.31 |
| 1 | Less than 1 Year | One Year | 146 | 69.72779 | 271.63 |
| 2 | Less than 1 Year | Two Year | 127 | 58.55936 | 165.84 |
| 3 | 1 - 2 Years | Month-to-Month | 686 | 1141.09855 | 1561.33 |
| 4 | 1 - 2 Years | One Year | 215 | 301.83451 | 539.23 |
| 5 | 1 - 2 Years | Two Year | 123 | 154.62726 | 320.29 |
| 6 | 2 - 3 Years | Month-to-Month | 433 | 1261.29955 | 1202.25 |
| 7 | 2 - 3 Years | One Year | 268 | 676.77707 | 357.64 |
| 8 | 2 - 3 Years | Two Year | 131 | 293.52748 | 539.71 |
| 9 | 3 - 4 Years | Month-to-Month | 282 | 1184.82725 | 706.99 |
| 10 | 3 - 4 Years | One Year | 278 | 1055.14521 | 738.98 |
| 11 | 3 - 4 Years | Two Year | 202 | 638.25187 | 649.90 |
| 12 | 4 - 5 Years | Month-to-Month | 209 | 1191.69267 | 474.09 |
| 13 | 4 - 5 Years | One Year | 326 | 1787.33850 | 703.57 |
| 14 | 4 - 5 Years | Two Year | 297 | 1332.88812 | 737.78 |
| 15 | 5 - 6 Years | Month-to-Month | 87 | 663.89198 | 121.62 |
| 16 | 5 - 6 Years | One Year | 317 | 2280.97123 | 679.28 |
| 17 | 5 - 6 Years | Two Year | 1003 | 6558.99507 | 2354.21 |
#Ploting two horizontal bar, one for customer status and the other for contract.
#Both can be controlled by the Radio Button widget
def rev_refund_plot_1(select_widget):
return rev_refund_df_1.hvplot.barh(x = 'Tenure',
y = select_widget,
by = 'Customer Status',
legend = 'top',
cmap = 'Dark2',
stacked = True,
height = 400,
title = f'{select_widget} vs. Tenure as per the Customer Status')
def rev_refund_plot_2(select_widget):
return rev_refund_df_2.hvplot.barh(x = 'Tenure',
y = select_widget,
by = 'Contract',
legend = 'top',
cmap = 'Set1',
stacked = True,
height = 400,
title = f'{select_widget} vs. Tenure as per the Contract')
rev_refund_bind_1 = pn.bind(rev_refund_plot_1, select_widget)
rev_refund_bind_2 = pn.bind(rev_refund_plot_2, select_widget)
pn.Column(select_widget, pn.Row(rev_refund_bind_1), pn.Row(rev_refund_bind_2))
Note:
# Grouping on churn category and churn reason(similar process as above)
churn_count_df = tele_cust_df.groupby(['Churn Category', 'Churn Reason'])\
.agg({'Customer ID': 'count',
'Total Revenue': 'sum',
'Total Refunds': 'sum'}).reset_index()
churn_count_df['Total Revenue'] = churn_count_df['Total Revenue'] / 1000
churn_count_df.rename(columns = {'Customer ID': 'Total Customer', 'Total Revenue': 'Total Revenue(10e3)'},
inplace = True)
churn_count_df
| Churn Category | Churn Reason | Total Customer | Total Revenue(10e3) | Total Refunds | |
|---|---|---|---|---|---|
| 0 | Attitude | Attitude of service provider | 94 | 174.85874 | 117.54 |
| 1 | Attitude | Attitude of support person | 220 | 404.69539 | 202.04 |
| 2 | Attitude | Competitor had better devices | 0 | 0.00000 | 0.00 |
| 3 | Attitude | Competitor made better offer | 0 | 0.00000 | 0.00 |
| 4 | Attitude | Competitor offered higher download speeds | 0 | 0.00000 | 0.00 |
| ... | ... | ... | ... | ... | ... |
| 121 | Price | Poor expertise of online support | 0 | 0.00000 | 0.00 |
| 122 | Price | Poor expertise of phone support | 0 | 0.00000 | 0.00 |
| 123 | Price | Price too high | 78 | 170.17043 | 183.37 |
| 124 | Price | Product dissatisfaction | 0 | 0.00000 | 0.00 |
| 125 | Price | Service dissatisfaction | 0 | 0.00000 | 0.00 |
126 rows × 5 columns
# Droping out all the zero values and also the 'None' category and reason values
churn_count_df = churn_count_df.loc[(churn_count_df['Total Customer'] != 0) |
(churn_count_df['Total Revenue(10e3)'] != 0) |
(churn_count_df['Total Refunds'] != 0)].drop(78).reset_index(drop = True)
churn_count_df
| Churn Category | Churn Reason | Total Customer | Total Revenue(10e3) | Total Refunds | |
|---|---|---|---|---|---|
| 0 | Attitude | Attitude of service provider | 94 | 174.85874 | 117.54 |
| 1 | Attitude | Attitude of support person | 220 | 404.69539 | 202.04 |
| 2 | Competitor | Competitor had better devices | 313 | 579.87705 | 485.84 |
| 3 | Competitor | Competitor made better offer | 311 | 582.82220 | 396.38 |
| 4 | Competitor | Competitor offered higher download speeds | 100 | 234.14569 | 227.94 |
| 5 | Competitor | Competitor offered more data | 117 | 297.56821 | 301.39 |
| 6 | Dissatisfaction | Lack of self-service on Website | 29 | 36.78815 | 35.34 |
| 7 | Dissatisfaction | Limited range of services | 37 | 47.25443 | 18.39 |
| 8 | Dissatisfaction | Network reliability | 72 | 132.44367 | 176.75 |
| 9 | Dissatisfaction | Poor expertise of online support | 31 | 56.61714 | 93.98 |
| 10 | Dissatisfaction | Poor expertise of phone support | 12 | 17.08599 | 85.68 |
| 11 | Dissatisfaction | Product dissatisfaction | 77 | 204.87574 | 100.21 |
| 12 | Dissatisfaction | Service dissatisfaction | 63 | 122.91414 | 131.19 |
| 13 | Other | Deceased | 6 | 4.17283 | 0.00 |
| 14 | Other | Don't know | 130 | 256.40335 | 146.13 |
| 15 | Other | Moved | 46 | 93.81296 | 0.00 |
| 16 | Price | Extra data charges | 39 | 104.15272 | 90.71 |
| 17 | Price | Lack of affordable download/upload speed | 30 | 66.77276 | 11.05 |
| 18 | Price | Long distance charges | 64 | 97.02823 | 35.05 |
| 19 | Price | Price too high | 78 | 170.17043 | 183.37 |
# Plotting a horizontal bar that shows widgets vs churn reason
def churn_count_plot(select_widget):
return churn_count_df.sort_values(select_widget)\
.hvplot.barh(x = 'Churn Reason',
y = select_widget,
by = 'Churn Category',
cmap = 'Dark2',
stacked = True,
title = f'{select_widget} vs. Churn Reason as per the Churn Category',
height = 500,
legend = 'bottom_right',
ylim = (0, 600))
churn_count_bind = pn.bind(churn_count_plot, select_widget)
pn.Column(select_widget, churn_count_bind)
Panel is capable to generate various types of dashboards using notebook. There are multiple different templates which are ready to use with less than 10 lines of codes but running the template will need a server, which will take us to a browser that breaks the workflow of our notebook. Template is good for the final phase of the development, but for now we will embed it in this notebook to get the feel of a dashboard.
# Dashboard title
dashboard_title = '# Maven Communications - Customer Churn Dashboard'
# Dashboard Description
dashboard_desc = """This dashboard displays the case study done on Maven Communications customer dataset,
to understand the fundamental behavioral pattern of all the customers and help the Marketing team to make
a strategy on "Customer Retention"."""
# Compiling the dashboard with all the above plots and widget to control the plots
dashboard = pn.Column(dashboard_title,
dashboard_desc,
select_widget,
cust_rev_bind,
rev_refund_bind_1,
rev_refund_bind_2,
churn_count_bind
)
# Embedding and displaying the dashboard
dashboard.embed();